Skip to content

Conversation

@tsbinns
Copy link
Contributor

@tsbinns tsbinns commented Feb 10, 2025

Reference issue (if any)

Fixes #13108

What does this implement/fix?

When reading annotations from a csv file, truncates nanoseconds from orig_time by converting to the microsecond format. The exception is for the default time of 1970-01-01 00:00:00 used in the files when Annotations.orig_time=None, as converting this to microseconds would cause it to be read as a proper time rather than being converted to None which is intended.

Also truncates nanoseconds from the times in utils.dataframes._convert_times() when time_format="datetime" by converting to dtype=datetime64[us].
Should this also be done when time_format="timedelta"?

Finally, updates unit tests that check csv files with nanosecond times get assigned proper orig_time when read in and checks that Annotations.to_data_frame() returns times with nanoseconds truncated (which in turn means truncated times are saved).

@tsbinns
Copy link
Contributor Author

tsbinns commented Feb 10, 2025

Also updated the docstring for Annotations.orig_time and added a RuntimeWarning when a string is provided but it gets converted to None. Thoughts on this?

@tsbinns
Copy link
Contributor Author

tsbinns commented Feb 10, 2025

The code for annotations works fine, but changing _convert_times() has some knock on effects for converting data objects to dataframes.

This is an alternative approach to fix the current errors:

elif time_format == "datetime":
    # make ISO8601 (microsecond) compatible
    times = (to_timedelta(times + first_time, unit="s") + meas_date).to_numpy()
    times = DatetimeIndex(
        [time.replace(nanosecond=0) for time in times], tz=times[0].tz
    )

but I'm not 100% on how others want to proceed so I'll pause for now.

@tsbinns
Copy link
Contributor Author

tsbinns commented Mar 10, 2025

Hey @drammock, please could I get your input on the proposed changes. The original annotations issue is resolved, but there are some implementation details that weren't discussed in the issue that I'd like to get suggestions on.
That would also help with deciding how I can resolve the issues with dataframe conversions.

@tsbinns
Copy link
Contributor Author

tsbinns commented Mar 10, 2025

The code for annotations works fine, but changing _convert_times() has some knock on effects for converting data objects to dataframes.

This is an alternative approach to fix the current errors:

elif time_format == "datetime":
    # make ISO8601 (microsecond) compatible
    times = (to_timedelta(times + first_time, unit="s") + meas_date).to_numpy()
    times = DatetimeIndex(
        [time.replace(nanosecond=0) for time in times], tz=times[0].tz
    )

but I'm not 100% on how others want to proceed so I'll pause for now.

Turns out this doesn't fully resolve things. Once the proposed changes to _convert_times() are better established I can tackle this properly.

@drammock drammock added this to the 1.11 milestone Sep 12, 2025
Copy link
Member

@larsoner larsoner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tsbinns do you still need feedback on the correct change here? One way forward would be to pick whichever you think would be best, get tests green, and we can see what the impact is...

@tsbinns
Copy link
Contributor Author

tsbinns commented Nov 5, 2025

@larsoner I'll have a look through this again tomorrow and refamiliarise myself/start trying to fix the failures. Then will see if I have some more concrete questions then!

@tsbinns
Copy link
Contributor Author

tsbinns commented Nov 6, 2025

Okay, so everything is green now.

This PR changes it so that nanoseconds in times will be dropped on reading annotations. As noted here (#13108 (comment)), sanitising on read will prevent issues with annotations created with MNE v1.7-1.10, as well as any created moving forward, so I think this could be considered a sufficient solution already.

Still, that comment also mentions sanitising annotations on write by adapting mne.utils.dataframe._convert_times() (which is called when saving annotations) to limit this to microsecond format datetimes. However, _convert_times() is used for dataframe conversion for more than just annotations, so changing it has some knock-on effects, e.g., for Raw -> DataFrame conversion. I have removed the changes that were causing problems, but I still think an argument can be made for sanitising on write.

For example, if I create annotations on MNE 1.11 (when it's released), I might want to share these annotations with colleagues who are using different environments and different MNE versions. If they are using MNE < 1.11, they will encounter the same issues reading these annotations, which would be avoided if we sanitise annotations on write.

Rather than altering _convert_times() and dealing with the wider package implications, we could just have the sanitisation take place in Annotations.to_data_frame(), something like:

    def to_data_frame(self, time_format="datetime"):
        """Export annotations in tabular structure as a pandas DataFrame.
            ...
        """
        pd = _check_pandas_installed(strict=True)
        valid_time_formats = ["ms", "timedelta", "datetime"]
        dt = _handle_meas_date(self.orig_time)
        if dt is None:
            dt = _handle_meas_date(0)
        time_format = _check_time_format(time_format, valid_time_formats, dt)
        dt = dt.replace(tzinfo=None)
        times = _convert_times(self.onset, time_format, dt)
        # ↓↓↓ New code
        if time_format == "datetime":
            # remove nanoseconds
            tz_name = (
                f", {meas_date.tzinfo.tzname(meas_date)}" if meas_date is not None else ""
            )
            times = times.astype(f"datetime64[us{tz_name}]")
        # -------------------
        df = dict(onset=times, duration=self.duration, description=self.description)
        if self._any_ch_names():
            df.update(ch_names=self.ch_names)
        df = pd.DataFrame(df)
        extras_df = pd.DataFrame(self.extras)
        df = pd.concat([df, extras_df], axis=1)
        return df

Alternatively, if people feel datetimes for all dataframes should not contain nanoseconds, I am happy to discuss this, but I feel those changes go beyond the scope of this PR.

WDYT @drammock @larsoner?

@larsoner
Copy link
Member

larsoner commented Nov 6, 2025

Rather than altering _convert_times() and dealing with the wider package implications

Have you considered

def _convert_times(..., *, drop_nano=False):
    ...
    if drop_nano:
        ...

and then only in the places you need it for annots you call with drop_nano=True? Seems like this would make it so we only drop nanoseconds when needed

@tsbinns
Copy link
Contributor Author

tsbinns commented Nov 6, 2025

Yeah, I can do that.
But at the moment we would only set drop_nano=True for Annotations -> DataFrame conversions at this time, correct?

@larsoner
Copy link
Member

larsoner commented Nov 6, 2025

Yeah I think so. Though admittedly I do not have my head 100% wrapped around the issue, so happy to go with what you think makes sense @tsbinns , I think you have given this issue (much) more deliberate consideration!

@tsbinns
Copy link
Contributor Author

tsbinns commented Nov 6, 2025

Okay. I haven't seen any issues reported with nanoseconds in dataframe conversions of other classes, so I'll keep it simple and stick with enabling this only for annotations!

@tsbinns
Copy link
Contributor Author

tsbinns commented Nov 6, 2025

Just pushed a commit that drops nanoseconds only for Annotations -> DataFrame conversions. Should be everything sorted once tests are green.

Copy link
Member

@larsoner larsoner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of tiny comments, @drammock did you want to look as well?

@tsbinns
Copy link
Contributor Author

tsbinns commented Nov 6, 2025

That's @larsoner's comments addressed.

Here's an example of what the warning about bad orig_time as input to Annotations looks like now:

RuntimeWarning: The format of the `orig_time` string is not
recognised. It must conform to the ISO8601 format with at most
microsecond precision and where the delimiter between date and
time is ' '. Got: 2002-12-03 19:01:11.720100000. Defaulting
`orig_time` to None.

@drammock drammock enabled auto-merge (squash) November 6, 2025 17:44
@drammock drammock merged commit b1340eb into mne-tools:main Nov 6, 2025
32 checks passed
@tsbinns tsbinns deleted the fix_annotations_orig_time branch November 7, 2025 11:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Annotations.orig_time not properly read from csv files

3 participants